Development and validation of a hypertension risk prediction model and construction of a risk score in a Canadian population

Identifying high-risk individuals for targeted intervention may prevent or delay hypertension onset. We developed a hypertension risk prediction model and subsequent risk sore among the Canadian population using measures readily available in a primary care setting. A Canadian cohort of 18,322 participants aged 35–69 years without hypertension at baseline was followed for hypertension incidence, and 625 new hypertension cases were reported. At a 2:1 ratio, the sample was randomly divided into derivation and validation sets. In the derivation sample, a Cox proportional hazard model was used to develop the model, and the model's performance was evaluated in the validation sample. Finally, a risk score table was created incorporating regression coefficients from the model. The multivariable Cox model identified age, body mass index, systolic blood pressure, diabetes, total physical activity time, and cardiovascular disease as significant risk factors (p < 0.05) of hypertension incidence. The variable sex was forced to enter the final model. Some interaction terms were identified as significant but were excluded due to their lack of incremental predictive capacity. Our model showed good discrimination (Harrel’s C-statistic 0.77) and calibration (Grønnesby and Borgan test, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\chi^{2}$$\end{document}χ2 statistic = 8.75, p = 0.07; calibration slope 1.006). A point-based score for the risks of developing hypertension was presented after 2-, 3-, 5-, and 6 years of observation. This simple, practical prediction score can reliably identify Canadian adults at high risk of developing incident hypertension in the primary care setting and facilitate discussions on modifying this risk most effectively.

Selection of candidate variables. Before commencing the analysis, we compiled a list of available potential candidate variables. We determine the possible candidate variables for inclusion in model development based on a literature search 4,24 , variables that have been used in the past 25 , and discussion with content experts. For this study, we considered 29 candidate variables for inclusion in the model. We deliberately did not consider genetic risk factors/biomarkers as potential candidate variables given our model's intended clinical application. Inclusion of the genetic risk factors in the model can reduce the model's usability due to a lack of readily available information.
Definition of outcome and variables. The outcome incident hypertension was determined from linked administrative health data using a coding algorithm. We used the relevant ICD-9 and ICD-10 codes (ICD-9-CM codes: 401.x, 402.x, 403.x, 404.x, and 405.x; ICD-10-CA/CCI codes: I10.x, I11.x, I12.x, I13.x, and I15.x) and a validated hypertension case definition (two physician claims within two years or one hospital discharge for hypertension) to define hypertension incidence (sensitivity 75%, positive predictive value 81%) 26 .
Out of 29 candidate variables, 11 were continuous, and 18 were categorical. Continuous variables remained continuous in the model developed and categorized only for deriving risk scores. A detailed description of the variables and their categorization is provided in the supplementary material (Appendix 2).
Missing values. Our dataset has missing values on several candidate variables ranging from 0 to 26%. Information on missing values for different candidate variables is presented in the supplementary table (Table S1). We used multiple imputation for missing data 27 . This technique predicts the missing values by utilizing the existing information from other available variables 28 and then substitute the missing values with the predicted values to create a complete dataset. Multiple imputation by chained equations (MICE) was used to impute the missing values using Stata's "ice" command 29  www.nature.com/scientificreports/ Statistical analysis. Before imputing missing values, the required assumption "missing at random" for performing multiple imputations was checked. We compared the study characteristics of those with missing with those without missing information using appropriate tests (unpaired t-test or the χ 2 -test). Continuous variables were expressed as the mean (SE), and categorical variables were expressed as numbers (percentage of the total). We randomly split subjects into two sets: the derivation set, which included 67% (two-thirds) of the sample (n = 12,233), and the validation set, which included the remaining 33% (one-third) (n = 6089). The two groups' baseline characteristics were compared using the unpaired t-test or the χ 2 -test, as appropriate. We developed a risk prediction model from the derivation data using the multivariable Cox proportional hazards model and assessed the goodness of fit using the validation data.
Collinearity among the variables was tested using the variance inflation factor (VIF) with a threshold of 2.5 30 . From the list of candidate variables, highly correlated variables were excluded based on VIF before applying the model.
The univariate Cox proportional hazards model was applied first to screen the variables for a significant association (p < 0.20) 31 with hypertension incidence in the derivation set. Variables identified as significant in the univariate association were later put into a multivariable Cox proportional hazards model to determine ultimate significant risk factors (p < 0.05) of incident hypertension. The interaction terms were also tested, with significant variables identified in the multivariable Cox model. During the model development process, the proportional hazard assumption associated with the Cox model was also tested. There are several methods for verifying proportionality assumption, and we tested the proportionality assumption by using the Schoenfeld and scaled Schoenfeld residuals. We tested the proportionality of the model as a whole and proportionality for each predictor.
The following general equation was used to calculate the risk of incident hypertension within time t: where S 0 (t) is the baseline survival function, assuming all variables are represented by average values at followup time t ; β i is the estimated regression coefficient of the i th variable; X i is the value of the i th variable; X i is the corresponding mean, and p denotes the number of variables.
In the validation data, the model's predictive performance was assessed. Model discrimination was evaluated using Harrell's C-statistic 32 . Harrel's C-statistic indicates the proportion of all pairs of subjects that can be ordered such that the subject who survived longer will have the higher predicted survival time than the subjects who survived shorter, assuming that these subject pairs are selected at random. Calibration was assessed using the Grønnesby and Borgan (GB) test 33 . The GB test is an overall goodness-of-fit test for the Cox proportional hazards model and is based on martingale residuals. In the GB test, the observations are divided into K groups according to their estimated risk score, an approach similar to Hosmer and Lemeshow goodness-of-fit for logistic regression 34 . Brier score was calculated at different time points, and a calibration plot was also used for assessing calibration. In a calibration plot, expected probabilities (predicted probabilities from the model) are plotted against observed outcome probabilities (calculated by Kaplan-Meier estimates). Arjas like plots were used for assessing the goodness of fit graphically 35 . We also produced histograms of the prognostic index (a linear predictor of the Cox model) to show the prognostic index distribution in the derivation and validation data set. We also assessed calibration using the approach proposed by Royston P 36 , where observed (Kaplan-Meier) and predicted survival probabilities compared in some prognostic groups derived by placing cut points on the prognostic index. We defined three risk groups (good, intermediate, and poor) from the 25th and 75th centiles of the prognostic index in the derivation dataset based on events.
We then created a point-based scoring system from the model so that it can be easily used in clinical practice. Integer points were assigned according to the presence/absence of each risk factor so that the overall risk can be estimated by summing the points together. We constructed the risk score utilizing the regression coefficients of our Cox model according to the method proposed by Sullivan et al. 37 . To facilitate calculating risk score, continuous variables considered in the model development were divided into categories as discussed before.
All statistical tests were two-sided. All statistical analyses were performed using Stata (Version 15.1; Stata Corporation, College Station, Texas 77845, USA).
Comparing existing model performances to the developed model. We used a few existing hypertension risk models in our dataset to explain how our developed model performed in comparison to those models when those were applied to our population. Model selection was primarily made based on the availability of the final variables considered in those selected models in our dataset. This eliminated the majority of existing models from consideration for validation in our data set. In addition, whether the model provided enough information to perform the validation was also considered a major factor in selecting a model. For example, if a model did not provide regression coefficients or hazard or odds ratios from which coefficients can be derived were excluded from consideration. Also, if a model did not provide the predictive performance of their models, such as discrimination or calibration, they were excluded from considerations. In this case, we won't be able to compare the validated model's predictive performance in our dataset. Considering the aforementioned factors and information from our recent systematic review 38 , we selected the models by Parikh et al. 15 , Kivimӓki et al. 39 , Lim et al. 10 , Chien et al. 14 , and Wang et al. 12 for validation in our dataset. The model by Parikh et al. 15 , also known as the Framingham Risk Score (FRS), was developed in the United States in a predominantly White population, with age, sex, systolic blood pressure (SBP), diastolic blood pressure (DBP), BMI, parental hypertension, cigarette smoking, and age by DBP as final variables. Kivimӓki et al. 39 developed the Whitehall II Risk Score in England in a predominantly White population. In model construction, the same FRS variables were used. Lim et al. 10 developed their model in Korea in an Asian population using the same variables as FRS. Chien et al. 14 developed Patient consent. Not required. The manuscript is based on the analysis of secondary de-identified data.
Patients and the public were not involved in the development, design, conduct or reporting of the study.

Results
Baseline characteristics of the study participants are presented in Table 1 and supplementary table (Table S2). In Table 1, the study participants' characteristics are given for the entire cohort as well as compared according to the status of developing hypertension. In contrast, in Table S2, characteristics are compared between the derivation sample and the validation sample. Overall, the study participants' mean age was 50.99 years, and the participation of females (68.55%) in the study was higher than the males (31.45%). During the median 5.8-year follow-up, 625 (3.41%) participants developed hypertension. In Table 1, most of the study characteristics were significantly different (p < 0.05) between those who developed hypertension and those who did not. Those who developed hypertension were relatively older, had higher (average) BMI, DBP, SBP, and more with diabetes and cardiovascular disease. The proportions of males and females were also significantly different between these two groups. However, some study characteristics were similar with no statistically significant difference (p > 0.05), including ethnicity, family history of hypertension, alcohol consumption, and total physical activity time. When we randomly divided the data into derivation and validation sets (Table S2), the study characteristics were similar with no significant difference (p < 0.05) between the derivation and validation sample except BMI waist ratio.
From the list of candidate variables, six (ever smoked, hip circumference, body fat percentage, BMI waist ratio, waist circumference, diastolic blood pressure.) were excluded from the model building due to their high collinearity (threshold VIF > 2.5) with other variables. Comparing the study characteristics between the missing and imputed is presented in the supplementary table (Table S3).
In the derivation sample, most of the candidate variables used in our study were identified as significant univariate predictors (Table 2). Variables not significantly associated with incident hypertension in univariate models were excluded from the multivariable model. In the multivariable model, age, sex, BMI, SBP, diabetes, CVD, total physical activity time, depression, waist-hip ratio, residence, highest education level completed, working status, total household income, family history of hypertension, smoking status, total sleep time, vegetable and fruit consumption, and job schedule was included. The multivariable Cox model indicated that age, BMI, SBP, diabetes, total physical activity time, and cardiovascular disease were independent risk factors of incident hypertension ( Table 2). We forced sex into the model, considering its clinical importance. The following interaction terms were added to the model with other significant variables in the multivariable Cox model: age by BMI, age by SBP, age by diabetes, age by CVD, age by total physical activity time, age by sex, BMI by sex, SBP by sex, diabetes by sex, CVD by sex, and total physical activity time by sex. When the interaction terms were included in the model, age by sex, age by BMI, age by SBP, age by total physical activity time, sex by SBP, and sex by CVD showed significant association with incident hypertension (Table 3). However, the inclusion of these interaction terms did not improve the models' discriminative performance. The models with and without interaction terms were virtually identical regarding their Harrel's C-statistics value (0.77 and 0.77, respectively) and statistical significance (p = 0.64). Consequently, the interaction terms were excluded from the finally selected model. The model with only main effects was used in subsequent analyses to construct a simpler and more user-friendly risk estimation equation and risk score. A global test for Cox proportional hazards assumption indicated no violation of assumptions (p = 0.72) (Supplementary Table S4). The baseline survival function at median follow-up time 5.80-years ≈ 6-years, S 0 (6) was (0.977). In the derivation sample, the model's discriminative performance (Harrel's C-statistic) was 0.77.
When we applied our derived model in the validation sample, the model's discriminative performance was good (Harrel's C-statistic 0.77). The results of the GB test indicated an acceptable calibration of the risk prediction model ( χ 2 statistic 8.75, p = 0.07, Fig. 1). To compare the observed and expected events in each group based on risk score, Arjas like plots are also presented (Fig. 2). A calibration plot of our prediction model at a time of 6-years was also presented in Fig. 3. A calibration slope of 1.006 indicates that predicted probabilities do not vary enough 40 . Figure 4 represents the calibration of our model in the derivation and validation datasets. The calibration of the model looks good in each dataset. The predictions in the validation dataset are good for both "Good" and "Intermediate" risk groups where survival and predicted probabilities are quite similar, except slightly higher predictions between 6-and 14-years time intervals for the "Intermediate" group. The predictions in the "Poor" group are consistent with the survival up to year six and somewhat high later; that is, survival tends to be worse than predicted. Due to fewer validation data events, the confidence intervals tend to be wider in validation data than in the derivation data. Figure 5 presents the prognostic index histogram in derivation and validation data, and no obvious irregularities and outliers were detected. Brier score calculated at 4-year, 5-year, 6-year, and 7-year time points are 0.018, 0.021, 0.026, and 0.029, respectively indicating accurate predictions.
Finally, from the developed model, a simple and practical risk score was created to calculate the risk of incident hypertension at different times (2-year, 3-year, 5-year, and 6-year) ( Table 4). The constant for the points system or the number of regression units that will correspond to one point was set as the risk associated with a  ), we used the 1st percentile and the 99th percentile of that variable to minimize the influence of extreme values. The points were initially computed as a decimal value, but later rounded to the nearest integer for facile calculation. The approximate risk of incident hypertension was then estimated via summation of the points awarded to each of the items. We attach the risks associated with each point total using the Cox regression equation (Table 5). Finally, we created risk categories according to the total points. In our model, the maximum total point is 40, and the minimum is − 2. For simple interpretation in a clinical setting, we categorize estimated risk into three categories and presented in Table 6. The risk estimate based on our newly developed Cox model is computed as follows:  Existing models' performances in our dataset. We compared the models' predictive performance using the most commonly reported predictive performance metric, the C-statistic. Table 7 shows the C-statistics from the original model and the C-statistics when the models were applied to our dataset. All models' performances were lower in our population than their original predictive performance. Figure 6 compares our model's predictive performance (C-statistic) to the five validated models. Our model's better predictive performance was observed, which supports the creation of our new prediction model for the Canadian population.

Discussion
In this large prospective cohort study, we developed a simple model to predict the risk of developing hypertension incidence in Canadian adults. The variables included in our model (age, sex, SBP, BMI, diabetes, cardiovascular disease, and self-reported total physical activity time) are routinely and easily assessed in the primary-care clinical setting. Our prediction model for hypertension risk had very good discrimination and calibration for both the derivation and validation samples, suggesting that this model has good performance and may perform well when applied to a different Canadian population. Also, a risk score table was derived for clinical implementation Unadjusted and adjusted hazard ratios and 95% confidence intervals for the risk factors of hypertension incidence The predictive performance of our model was similar to other studies. Although prediction models' performance varies considerably across studies, our recent meta-analysis on the predictive performance of hypertension risk prediction models indicates an overall pooled C-statistic of 0.75 [95% CI: 0.73-0.77] 41 , which justifies our model's good predictive performance. Framingham hypertension risk score 15 , the most validated hypertension risk prediction model, had a C-statistic of 0.78, similar to our model. Our model's calibration was also right on several performance measures.

Observed Expected
Grønnesby and Borgan (GB) test (4) statistic = 8.75, p = 0.07  Fig. S1). The variable sex was not identified as a significant factor in our model, but we forced it into the model considering its clinical implication 42 . Diabetes and CVD were the two significant risk factors in our model, often excluded by many studies. Individuals who have diabetes or CVD have a higher risk of developing hypertension than those free of these conditions. Our risk prediction model aimed to identify the risk factors for hypertension in adults but excluding people with diabetes and CVD would limit our results' generalizability. To develop   Fig. S1). In our study, these risk factors were not identified as significant. Their inclusion in the model also did not change the model's discriminative performance (Harrel's C-statistic remains the same as 0.77). We identified total physical activity time significantly contributes to our model. This finding is significant because exercise is considered a preventive factor for hypertension incidence supported by scientific evidence 43 . Moreover, it is a highly modifiable lifestyle factor, and physical activity changes can modify the status of hypertension incidence.
We assessed interaction effects in our model, and several of the interaction terms were identified as significant. However, inclusion of interaction terms in the model did not improve the model's predictive performance. Our focus was on generating a simple and user-friendly risk scoring algorithm avoiding complexity. As a result, the interaction terms were excluded from the model in final considerations.
To our knowledge, this is the first hypertension risk prediction model developed explicitly in a Canadian population. The model was created using a large sample size, and the estimates from our prediction models were found to be stable, as demonstrated in the internal validation. Further, consideration of many candidate variables in model building is also a strength of this study. In contrast to most studies, where models were developed in www.nature.com/scientificreports/ complete cases, excluding those with missing values, we imputed missing values in our study. This approach prevented information loss, maximized information utilization, and made the results robust. We could have used an existing model and evaluated its performance through external validation in our dataset before creating a new risk score. However, we refrain from doing this for the following reasons: First, when applied to new individuals, a prediction model typically performs worse than it did with its original study population 18,44 . When a low predictive accuracy is discovered after an external validation study, researchers must decide whether to reject the model or update it to improve its predictive accuracy. By combining information captured in the original model with information from new individuals from the validation study, the model can be updated or recalibrated for local circumstances 18 . Model updating entails adding more predictors or altering a portion of the formula to better suit the external population 18 . The appropriateness of model updating during external validation is a point of contention among researchers. Some claim that the researchers are developing a new prediction model even with minor changes 18,45,46 . Second, developing a new prediction model along with externally validating a well-known existing prediction model in the development cohort and concluding that the new model performs better is an inappropriate comparison in our view. Because this is then comparing the performance of one model in development to the performance of another model in external validation 18 . The newly developed model will almost always appear superior because it is optimally designed to fit the development data 18 . The performance of two existing prediction models should be directly compared in an external validation dataset that is independent of both model development cohorts. Given this, we did not evaluate an existing model's performance and then develop a new model on the same dataset. Nevertheless, for the purpose of comparison, we assessed a few of the published hypertension risk prediction models in our population and found that their performance was inferior to ours.
Our study has several limitations. Study participants were middle-aged Canadians. Prevention strategies are likely to be more effective if the young population can be targeted. Nevertheless, our study participants' age range will likely have minimal impact on our study's generalizability, as essential hypertension develops in the middle aged adults 47 , as represented here. At baseline, we excluded participants with self-reported hypertension, which can potentially lead to misclassification of hypertension status. The incidence rate of hypertension in our Table 4. Calculation of point values for risk score *Reference category. The age range in the sample is 35-70. a The range of body mass index is 12.5-64.9. To determine the reference values for the first and last categories, we use the 1st percentile (18.5) and the 99th percentile (42.7) to minimize extreme values' influence. **The range of physical activity total is from 33 MET minutes/week to 19,278 MET minutes/week. To determine the reference values for the first and last categories, we use the 1st percentile (99) and the 99th percentile (13,518) to minimize extreme values' influence. b The range of systolic blood pressures is 76-205. To determine the reference values for the first and last categories, we use the 1st percentile (92) and the 99th percentile (156) to minimize extreme values' influence. The constant for the points system or the number of regression units will correspond to one point. Here, we let B reflect the increase in risk associated with a 5-year increase in age: B = 5(0.02768) = 0.1384.    48 . There can be several potential reasons for that. The characteristics of the study participants in ATP may be different from the general Alberta population. For example, female participation in ATP data was more than double the male participation (69% vs. 31%), and the hypertension incidence rate in Alberta was much lower in females than the males in study age groups 48 . A potential selection bias also may lead to a lower incidence rate of hypertension in our study The participants in ATP were mainly selected using the volunteer sampling method 49 . Those who decided to join the study (i.e., who self-select into the survey) may have a different characteristic (e.g., healthier) than the non-participants. Due to the longitudinal nature of the study, there can also be a loss of study participants during follow-up. Participants who were lost to follow-up (e.g., due to emigration out of the province) may be more likely to develop hypertension. Our study ascertained outcome hypertension from a linked administrative health data (the hospital discharge abstract or physician claims data source) due to a lack of longitudinal data in ATP. There is a possibility that the outcome ascertainment was incomplete as we did not have measured blood pressure to verify. Also, people who did not have a healthcare encounter after cohort enrollment (e.g., did not visit a family physician/general practitioner or were not admitted to the hospital during the study period) were missed and can potentially lead to a lower hypertension incidence. We did not account for competing risks in our study because the expected event (death) rate is low as the cohort was healthy and relatively young at inception with a short follow-up time. We did not include genetic risk factors or biomarkers in our model. The inclusion of genetic risk factors in the model has the potential of improving risk prediction. However, our recent meta-analysis on hypertension risk prediction models 41 and previous studies 11 did not show any differences in discriminative performance (pooled C-statistic was 0.76 for models developed using genetic risk factors/biomarkers). In addition, the inclusion of genetic risk factors in the model may decrease the prediction model's application in routine clinical practice. Sodium intake is an important dietary factor for the risk of incident hypertension; however, in our study, sodium intake data were not available. We could not perform an external validation of our model, essential for any prediction model's generalizability. Therefore, further validation of our model in other populations, particularly in another Canadian jurisdiction, is warranted. A direct comparison of our Table 7. The predictive performance of some of the past published hypertension prediction models in our dataset. www.nature.com/scientificreports/ models' performance with other models on the same dataset will allow us to properly understand the quality of our new model, allowing for a head-to-head comparison of predictive performance between models. The direct comparison will show which model performs best, which will help guide future research and clinical practice.

Model
In the future, we plan to compare our model to other relevant models on a separate dataset.
In conclusion, we have developed a simple yet practical prediction model to estimate the risk of incident hypertension for the Canadian population. Risk assessment tools are believed to be convenient in motivating high-risk individuals for future health problems to modify their lifestyles to decrease their risks. Once the model is validated via external validation studies, it can help identify individuals at higher risk of hypertension, increase health consciousness, motivate individuals to improve their lifestyles and prevent or delay the onset of hypertension.

Data availability
The data that support the findings of this study are available from Alberta's Tomorrow Project (ATP) but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Alberta's Tomorrow Project (ATP).